import pandas as pd
import numpy as np
import geopandas as gpd
Data Preparation¶
This lab loads three datasets: the Seattle Police Department Beats, Seattle Neighborhoods, and current crime data in Seattle.
Below is a definition of each dataset, with a link for more details.
[Seattle Police Department Beats]: The Seattle Police Department has used different iterations of police beats as the city's needs evolved. This layer represents what the beats have looked like from 2017 to the present. (https://data-seattlecitygis.opendata.arcgis.com/datasets/SeattleCityGIS::seattle-police-department-beats-web-mercator/about)
[Seattle Neighborhoods]: Neighborhood map atlas neighborhood areas are derived from the Seattle City Clerk's Office Neighborhood Map Atlas. These are the smallest areas and have been supplemented with alternate names from other sources; they roll up to the district areas. (https://data-seattlecitygis.opendata.arcgis.com/datasets/SeattleCityGIS::neighborhood-map-atlas-neighborhoods/about)
[SPD Crime Data 2008 - Present]: The Seattle Police Department stores crime data from 2008 to the present in the National Incident-Based Reporting System (NIBRS). The data include report date and time, block address, latitude, longitude, offense sub category, and more. (https://data.seattle.gov/Public-Safety/SPD-Crime-Data-2008-Present/tazs-3rd5/about_data)
The crime data grow with new records over time, while the beats and neighborhood boundaries are essentially constant. Because the full 2008-to-present crime dataset is huge, the API endpoint returns a limited number of records; in my experience, about 2,000. Given that cap, we will query the crime data by date so that only recent records are returned. Otherwise the endpoint returns an arbitrary subset of the dataset, which would not be meaningful to analyze.
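The date-filtered endpoint described above can also be assembled programmatically, which avoids hand-encoding spaces and quotes. A minimal sketch using only the standard library (the dates are illustrative; adjust them to your query window — note `urlencode` percent-encodes the `$` prefix as `%24`, which the server decodes the same way):

```python
from urllib.parse import urlencode, quote

# Build the Socrata (SoQL) query string without hand-encoding %20 / %27.
base = "https://data.seattle.gov/resource/tazs-3rd5.json"
params = {
    "$limit": 2000,
    "$where": "report_date_time between '2025-11-06' and '2025-11-11'",
}
url = base + "?" + urlencode(params, quote_via=quote)
print(url)
```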
We will load the Beats and Neighborhoods layers from their GeoJSON API endpoints.
beats=gpd.read_file("https://services.arcgis.com/ZOyb2t4B0UYuYNYH/arcgis/rest/services/Current_Beats/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")
nbh=gpd.read_file("https://services.arcgis.com/ZOyb2t4B0UYuYNYH/arcgis/rest/services/nma_nhoods_sub/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")
For the crime data, we will compose the query into the API endpoint first.
Crime Data
url = "https://data.seattle.gov/resource/tazs-3rd5.json?$limit=2000&$where=report_date_time%20between%20%272025-11-06%27%20and%20%272025-11-11%27"
incidents = pd.read_json(url)
incidents.head()
| | report_number | report_date_time | offense_id | offense_date | nibrs_group_a_b | nibrs_crime_against_category | offense_sub_category | shooting_type_group | block_address | latitude | longitude | beat | precinct | sector | neighborhood | reporting_area | offense_category | nibrs_offense_code_description | nibrs_offense_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-325462 | 2025-11-06 00:11:39 | 67248678577 | 2025-11-05T22:47:00.000 | A | PERSON | ASSAULT OFFENSES | - | 11XX BLOCK OF SUMMIT AVE | 47.61133751 | -122.32389709863 | E3 | East | E | FIRST HILL | 7623 | ALL OTHER | Intimidation | 13C |
| 1 | 2025-325470 | 2025-11-06 00:15:06 | 67217550137 | 2025-11-05T23:14:00.000 | B | SOCIETY | DISORDERLY CONDUCT & VAGRANCY VIOLATIONS | - | 15XX BLOCK OF 3RD AVE W | 47.63292666 | -122.361015838569 | Q2 | West | Q | QUEEN ANNE | 8623 | ALL OTHER | Curfew/Loitering/Vagrancy Violations | 90B |
| 2 | 2025-325470 | 2025-11-06 00:15:06 | 67217554931 | 2025-11-05T23:14:00.000 | B | ANY | ALL OTHER | - | 15XX BLOCK OF 3RD AVE W | 47.63292666 | -122.361015838569 | Q2 | West | Q | QUEEN ANNE | 8623 | ALL OTHER | All Other Offenses | 90Z |
| 3 | 2025-325470 | 2025-11-06 00:15:06 | 67217557249 | 2025-11-05T23:14:00.000 | A | SOCIETY | NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) | - | 15XX BLOCK OF 3RD AVE W | 47.63292666 | -122.361015838569 | Q2 | West | Q | QUEEN ANNE | 8623 | ALL OTHER | Drug/Narcotic Violations | 35A |
| 4 | 2025-325482 | 2025-11-06 00:55:17 | 67217736640 | 2025-11-05T23:58:00.000 | A | SOCIETY | NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) | - | 6TH AVE S / S DEARBORN ST | 47.59583728 | -122.326373794937 | K3 | West | K | CHINATOWN/INTERNATIONAL DISTRICT | 1499 | ALL OTHER | Drug/Narcotic Violations | 35A |
incidents['longitude'].describe()
count 952 unique 628 top REDACTED freq 163 Name: longitude, dtype: object
incidents['latitude'].describe()
count 952 unique 616 top REDACTED freq 163 Name: latitude, dtype: object
incidents2 = incidents[(incidents['longitude'] != 'REDACTED')]
incidents2 = incidents2.astype({'longitude': float, 'latitude':float})
# gpd.GeoDataFrame() converts the data to spatial form (a GeoDataFrame), building point geometries with the points_from_xy() method.
spd_crime=gpd.GeoDataFrame(incidents2, geometry=gpd.points_from_xy(incidents2.longitude, incidents2.latitude), crs='EPSG:4326')
spd_crime.plot()
<Axes: >
spd_crime['longitude'].describe()
count 789.000000 mean -122.022878 std 6.104854 min -122.408242 25% -122.347592 50% -122.327014 75% -122.313407 max -1.000000 Name: longitude, dtype: float64
spd_crime2 = spd_crime[(spd_crime['longitude'] != -1)]
spd_crime2.plot()
<Axes: >
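Filtering the literal value -1 works for this pull, but a rough bounding-box check is a more general guard against placeholder coordinates. A minimal sketch with toy data (the bounds are my approximation of Seattle's extent, not from the source):

```python
import pandas as pd

# Toy points: the middle row carries a placeholder coordinate.
df = pd.DataFrame({
    "longitude": [-122.33, -1.0, -122.40],
    "latitude":  [47.61, 47.60, 47.65],
})
# Keep only points inside an approximate Seattle bounding box.
in_seattle = df["longitude"].between(-122.45, -122.20) & df["latitude"].between(47.45, 47.75)
clean = df[in_seattle]
print(clean)
```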
spd_crime3 = spd_crime2[[ 'report_number','report_date_time','offense_id','offense_category' ,'latitude', 'longitude','geometry']]
spd_crime3
| | report_number | report_date_time | offense_id | offense_category | latitude | longitude | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 2025-325462 | 2025-11-06 00:11:39 | 67248678577 | ALL OTHER | 47.611338 | -122.323897 | POINT (-122.3239 47.61134) |
| 1 | 2025-325470 | 2025-11-06 00:15:06 | 67217550137 | ALL OTHER | 47.632927 | -122.361016 | POINT (-122.36102 47.63293) |
| 2 | 2025-325470 | 2025-11-06 00:15:06 | 67217554931 | ALL OTHER | 47.632927 | -122.361016 | POINT (-122.36102 47.63293) |
| 3 | 2025-325470 | 2025-11-06 00:15:06 | 67217557249 | ALL OTHER | 47.632927 | -122.361016 | POINT (-122.36102 47.63293) |
| 4 | 2025-325482 | 2025-11-06 00:55:17 | 67217736640 | ALL OTHER | 47.595837 | -122.326374 | POINT (-122.32637 47.59584) |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 945 | 2025-330622 | 2025-11-10 22:58:50 | 67307792849 | ALL OTHER | 47.690621 | -122.374103 | POINT (-122.3741 47.69062) |
| 946 | 2025-330622 | 2025-11-10 22:58:50 | 67306450951 | ALL OTHER | 47.690621 | -122.374103 | POINT (-122.3741 47.69062) |
| 948 | 2025-330645 | 2025-11-10 23:23:05 | 67306954301 | ALL OTHER | 47.613375 | -122.346513 | POINT (-122.34651 47.61338) |
| 950 | 2025-330661 | 2025-11-10 23:40:38 | 67307191880 | PROPERTY CRIME | 47.647728 | -122.400922 | POINT (-122.40092 47.64773) |
| 951 | 2025-330610 | 2025-11-10 23:41:46 | 67307019385 | PROPERTY CRIME | 47.609604 | -122.329221 | POINT (-122.32922 47.6096) |
787 rows × 7 columns
import folium
map = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
map
map = folium.Map(location=[47.6577792,-122.3031197], tiles="CartoDB Positron", zoom_start=10)
map
We can pass a GeoJSON object to the folium.GeoJson() method, and a handy feature of geopandas is that a GeoDataFrame can be passed as GeoJSON to Folium to make a map. The following code creates a Seattle neighborhood map with a popup displaying the value in the 'L_HOOD' column. Note that the code first makes a map object, then builds a layer from the nbh data and attaches it with the add_to() method. Lastly, when map1 is called, map1 is displayed as the output.
map1 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
folium.GeoJson(nbh,
    popup=folium.GeoJsonPopup(fields=['L_HOOD'])
).add_to(map1)
map1
map1.save("index.html")
map2 displays the SPD beats data, with a popup showing the beat name.
map2 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
folium.GeoJson(beats,
    popup=folium.GeoJsonPopup(fields=['beat'])
).add_to(map2)
map2
# Convert the timestamps to strings so the column serializes cleanly to GeoJSON
spd_crime3['report_date_time'] = spd_crime3['report_date_time'].dt.strftime('%Y-%m-%d %H:%M:%S')
map3 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
folium.GeoJson(spd_crime3).add_to(map3)
map3
/opt/conda/lib/python3.11/site-packages/geopandas/geodataframe.py:1968: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value)
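The warning above appears because spd_crime3 was built by selecting columns from spd_crime2, so pandas cannot tell whether the later assignment writes to a view or a copy. Taking an explicit .copy() when subsetting makes the intent unambiguous; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
subset = df[["a"]].copy()       # explicit copy: no view/copy ambiguity
subset["a"] = subset["a"] * 10  # modifies only the copy, no warning
print(subset)
print(df)                       # original is untouched
```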
Incidents by category¶
Now we will classify the data by offense category to map the crime reports by category. We will first build the list of offense types and then assign a color to each type in the list.
offense_type = spd_crime3.offense_category.unique().tolist()
offense_type
['ALL OTHER', 'PROPERTY CRIME', 'VIOLENT CRIME']
colors = {'PROPERTY CRIME':'red', 'VIOLENT CRIME': 'darkpurple', 'ALL OTHER':'lightblue'}
Now we have three reported offense types, with color codes assigned in the colors dictionary. The following code iterates over the records in the GeoDataFrame, checks each offense category, and assigns the matching color from colors to the marker when the record's value equals a key in colors.
map4 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
# Add markers with colors based on category
for index, row in spd_crime3.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['offense_category'],
        icon=folium.Icon(color=colors.get(row['offense_category'], 'gray'))  # default to gray if category not found
    ).add_to(map4)
map4
First, let's see incidents from today. The following code makes an object holding today's date. Working with the datetime data type is a bit tricky because we need to think about standard time versus local time (time zones). 'today' returns today's date, but you will see that it is not local time, so let's convert it to local datetime information. Setting utc=True makes the timestamp time-zone aware; tz_convert('America/Los_Angeles') then converts it to Seattle's time zone, and date() returns just the date.
curr_date = pd.to_datetime('today', utc=True).tz_convert('America/Los_Angeles').date()
curr_date
datetime.date(2025, 12, 1)
See the incidents reported today with the next line of code. In my case, it returns no records, which could mean either that no incidents occurred or that none have been reported yet.
incidents_today = spd_crime3[pd.to_datetime(spd_crime3['report_date_time']).dt.date == curr_date]
incidents_today
| report_number | report_date_time | offense_id | offense_category | latitude | longitude | geometry |
|---|---|---|---|---|---|---|
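Because a date filter can legitimately come back empty (as it did here), a small guard before building the map keeps the notebook from rendering an uninformative layer. A sketch with a stand-in empty frame:

```python
import pandas as pd

incidents_today = pd.DataFrame()  # stand-in for an empty query result
if incidents_today.empty:
    message = "No incidents reported for this date yet."
else:
    message = f"{len(incidents_today)} incidents to map."
print(message)
```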
map5 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
folium.GeoJson(incidents_today).add_to(map5)
map5
How about incidents reported yesterday? To compute dates, we use a Timedelta. Timedeltas are absolute differences in times, expressed in various units (e.g. days, hours, minutes, seconds). days=1 in pd.Timedelta indicates a one-day duration, so subtracting pd.Timedelta(days=1) from curr_date yields yesterday's date.
incidents_yesterday = spd_crime3[pd.to_datetime(spd_crime3['report_date_time']).dt.date == curr_date - pd.Timedelta(days=1)]
incidents_yesterday
| report_number | report_date_time | offense_id | offense_category | latitude | longitude | geometry |
|---|---|---|---|---|---|---|
from datetime import date
Incidents by Neighborhood by date¶
Here, we will look at the distribution of total incident counts by date and then examine the spatial distribution on the date with the maximum number of incidents. The process is:
- Identify the incident distribution by date, by grouping incidents by date and counting the incidents in each group.
- From the distribution, find the date with the highest number of incidents.
spd_crime3.loc[:, 'report_date_time'] = pd.to_datetime(spd_crime3['report_date_time']).dt.date
daily_crime_counts = spd_crime3.groupby(['report_date_time'])['offense_id'].count().reset_index()
daily_crime_counts
| | report_date_time | offense_id |
|---|---|---|
| 0 | 2025-11-06 | 192 |
| 1 | 2025-11-07 | 114 |
| 2 | 2025-11-08 | 144 |
| 3 | 2025-11-09 | 175 |
| 4 | 2025-11-10 | 162 |
daily_crime_counts = daily_crime_counts.rename(columns={"offense_id":"n_incidents"})
daily_crime_counts.plot.bar(x='report_date_time', y='n_incidents')
<Axes: xlabel='report_date_time'>
From the output dataframe and the bar graph, Nov. 06, 2025 has the highest number of incidents. Now let's make a choropleth map to see the spatial distribution of the events on that date.
crime_mmddyy=spd_crime3[pd.to_datetime(spd_crime3['report_date_time']).dt.date == date(2025, 11, 6)]
crime_nbh_mmdd = nbh.sjoin(crime_mmddyy, how='left', predicate='intersects')
crime_nbh_mmdd = crime_nbh_mmdd.dissolve(by='S_HOOD', aggfunc='count').reset_index()
crime_nbh_mmdd = crime_nbh_mmdd[['S_HOOD','geometry','OBJECTID']].rename(columns={"OBJECTID":"n_incidents"})
crime_nbh_mmdd
| | S_HOOD | geometry | n_incidents |
|---|---|---|---|
| 0 | Alki | POLYGON ((-122.38153 47.5895, -122.38182 47.58... | 1 |
| 1 | Arbor Heights | POLYGON ((-122.3769 47.51749, -122.376 47.5174... | 1 |
| 2 | Atlantic | POLYGON ((-122.30837 47.60167, -122.30401 47.6... | 7 |
| 3 | Ballard | POLYGON ((-122.37621 47.6671, -122.37622 47.66... | 3 |
| 4 | Belltown | POLYGON ((-122.34264 47.61637, -122.33874 47.6... | 6 |
| ... | ... | ... | ... |
| 89 | West Woodland | POLYGON ((-122.37559 47.67593, -122.37329 47.6... | 6 |
| 90 | Westlake | POLYGON ((-122.34354 47.62802, -122.34354 47.6... | 1 |
| 91 | Whittier Heights | POLYGON ((-122.37634 47.67592, -122.37652 47.6... | 1 |
| 92 | Windermere | POLYGON ((-122.27954 47.67578, -122.27736 47.6... | 1 |
| 93 | Yesler Terrace | POLYGON ((-122.31678 47.60612, -122.31678 47.6... | 2 |
94 rows × 3 columns
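One caveat with the left join above: a neighborhood that intersects no incidents still contributes one row, and counting OBJECTID (a left-side column that is never null) reports 1 for it. If true zeros matter, count a right-side column such as offense_id instead. A minimal sketch of the difference with toy data (not the lab's layers):

```python
import pandas as pd

# Toy left join: neighborhood 'B' matched no incident, so offense_id is null there.
joined = pd.DataFrame({
    "hood": ["A", "A", "B"],
    "left_id": [1, 1, 2],               # left-side column: always non-null
    "offense_id": [101.0, 102.0, None], # right-side column: null where no match
})
print(joined.groupby("hood")["left_id"].count())     # B counted as 1 (misleading)
print(joined.groupby("hood")["offense_id"].count())  # B counted as 0 (true zero)
```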
from branca.colormap import linear
colormap = linear.PuRd_05.scale(
crime_nbh_mmdd.n_incidents.min(), crime_nbh_mmdd.n_incidents.max()
)
data_dict = crime_nbh_mmdd.set_index("S_HOOD")["n_incidents"]
map8 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=11)
folium.GeoJson(crime_nbh_mmdd,
    name="incidents",
    style_function=lambda feature: {
        "fillColor": colormap(data_dict[feature['properties']["S_HOOD"]]),
        "color": "black",
        "weight": 1,
        "dashArray": "5, 5",
        "fillOpacity": 0.9,
    },
    popup=folium.GeoJsonPopup(fields=['S_HOOD','n_incidents'])
).add_to(map8)
folium.LayerControl().add_to(map8)
colormap.add_to(map8)
map8
Buffers¶
A buffer is the most commonly used proximity analysis. It creates an area around a given feature (point, line, or polygon) at a given distance. We have to be careful when assigning the distance: the distance unit is Euclidean, yet the point data are in the WGS 84 (EPSG:4326) coordinate system. Recall that the point data were generated from longitude and latitude; the base unit of EPSG:4326 is the degree, which does not match a Euclidean distance unit. So we first reproject the point layer into a projection with Euclidean units and then assign the buffer distance. This example reprojects the incident layer to EPSG:32610, which uses the WGS84 datum and the UTM zone 10N projection.
First I will make point data for the Pioneer Square location from its latitude and longitude. I got these coordinates from OpenStreetMap, as that is the default basemap in Folium. Then I reproject the point to EPSG:32610 and make a 500-meter buffer around it. Finally, the buffer is reprojected back to EPSG:4326 with to_crs() to continue geoprocessing with the datasets in this practice.
from shapely.geometry import Point
ps_loc = Point(-122.3313539, 47.6027217)
ps = gpd.GeoDataFrame(geometry=[ps_loc], crs=4326)
ps_utm = ps.to_crs('epsg:32610')
ps_buffer = ps_utm.buffer(500)
ps_buffer_4326 = ps_buffer.to_crs('epsg:4326')
ps_buffer_4326
0 POLYGON ((-122.3247 47.60268, -122.32474 47.60... dtype: geometry
map9 = folium.Map(location=[47.6008863,-122.3377], zoom_start=14)
folium.GeoJson(
    data=ps_buffer_4326.to_json(),
).add_to(map9)
folium.Marker(
    location=[47.6027217,-122.3313539],
).add_to(map9)
map9
The buffer polygon was created successfully! We are ready to find the incidents within 500 meters of Pioneer Square. The computation uses the geopandas overlay method: I put the buffer geometry into a GeoDataFrame, then pass it to overlay() as the geometry that filters the incidents.
ps_buffer_4326 = gpd.GeoDataFrame(geometry=ps_buffer_4326)
crime_ps = spd_crime3.overlay(ps_buffer_4326, how='intersection')
Interactive Widgets¶
Widgets are eventful python objects that have a representation in the browser, often as a control like a slider, textbox, etc. You can use widgets to build interactive GUIs for your notebooks or to synchronize stateful and stateless information between Python and JavaScript.
Jupyter Widgets is primarily a framework to provide interactive controls. The ipywidgets package also provides a basic, lightweight set of core form controls that use this framework. These included controls include a text area, text box, select and multiselect controls, checkbox, sliders, tab panels, grid layout, etc.
The framework for building rich interactive objects is the foremost purpose of the Jupyter Widgets project, and the set of included core form controls is purposefully kept small and self-contained. We encourage and support a robust ecosystem of packages built on top of the Jupyter Widgets framework to provide more complicated interactive objects, such as maps, 2d and 3d visualizations, or other form control systems built on a variety of popular Javascript frameworks such as Material or Vue.
We have filtered the incident data by location, by polygon, by date, and by proximity to an area. Here, we will make an interactive map that lets a user examine the incident data by neighborhood.
import ipywidgets as widgets
from IPython.display import display
spd_crime3['report_date_time'] = spd_crime3['report_date_time'].astype('str')
map11 = folium.Map(location=[47.6008863,-122.3377], zoom_start=10)
out = widgets.Output(layout={'border': '1px solid black'})
# Create a Dropdown widget
dropdown = widgets.Dropdown(
    options=nbh['S_HOOD'].values.tolist(),
    value=nbh['S_HOOD'].values[0],  # default selected value
    description='Select',
    disabled=False
)
def on_dropdown_change(change):
    out.clear_output()
    with out:
        map11_n = folium.Map(location=[47.6008863,-122.3377], zoom_start=10)
        my_gdf = spd_crime3.overlay(nbh[nbh['S_HOOD']==dropdown.value], how='intersection')
        layer1 = folium.GeoJson(my_gdf).add_to(map11_n)
        n_mygdf = len(my_gdf)
        print("The number of crimes at " + dropdown.value + f" is {n_mygdf}")
        display(map11_n)
dropdown.observe(on_dropdown_change, names='value')
display(dropdown)
with out:
display(map11)
out
/opt/conda/lib/python3.11/site-packages/geopandas/geodataframe.py:1968: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value)
Dropdown(description='Select', options=('Loyal Heights', 'Ballard', 'Whittier Heights', 'West Woodland', 'Phin…
Output(layout=Layout(border_bottom='1px solid black', border_left='1px solid black', border_right='1px solid b…
Autoplay Heatmap with Timestamp¶
# Convert back to dates (a sort_values here would be realigned by index on assignment; groupby below iterates dates in sorted order anyway)
spd_crime3['report_date_time'] = pd.to_datetime(spd_crime3['report_date_time']).dt.date
data = []
for _, d in spd_crime3.groupby('report_date_time'):
    data.append([[row['latitude'], row['longitude']] for _, row in d.iterrows()])
/opt/conda/lib/python3.11/site-packages/geopandas/geodataframe.py:1968: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value)
from datetime import datetime, timedelta
time_index = [
(min(spd_crime3['report_date_time']) + k * timedelta(1)).strftime("%Y-%m-%d") for k in range(len(data))
]
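Note that this label construction assumes every day between the minimum date and the last frame appears in the data; a day with no incidents would push the labels out of step with the frames. Building the labels from the sorted unique dates themselves (the same order groupby iterates in) avoids that assumption; a sketch with toy dates that skip a day:

```python
import pandas as pd

# Toy dates with a gap: no incidents on 2025-11-08.
dates = pd.to_datetime(pd.Series(["2025-11-06", "2025-11-07", "2025-11-09"])).dt.date
# One label per actual day, in the same sorted order groupby uses.
time_index = [d.strftime("%Y-%m-%d") for d in sorted(dates.unique())]
print(time_index)
```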
from folium.plugins import HeatMapWithTime
map12 = folium.Map(location=[47.6008863,-122.3377], zoom_start=10)
hm = HeatMapWithTime(data, index=time_index, auto_play=True, max_opacity=0.3)
hm.add_to(map12)
display(map12)
Now we will query the incidents data for two different 5-day periods this year. I will create a map for each and compare the number of crimes during the two periods.
- Which period had the highest total number of crimes?
- For each period: which beat shows the highest number of violent crimes?
- For each period: how many property crimes happened on the second-to-last day of the queried dataset?
url1 = (
"https://data.seattle.gov/resource/tazs-3rd5.json"
"?$limit=2000"
"&$where=report_date_time%20between%20'2025-11-06'%20and%20'2025-11-11'"
)
incidents1 = pd.read_json(url1)
url2 = (
"https://data.seattle.gov/resource/tazs-3rd5.json"
"?$limit=2000"
"&$where=report_date_time%20between%20'2025-10-01'%20and%20'2025-10-05'"
)
incidents2 = pd.read_json(url2)
Total number of crimes in each period
total_p1 = len(incidents1)
total_p2 = len(incidents2)
total_p1, total_p2
(952, 704)
Beat with highest number of violent crimes in each period
def beat_with_most_violent(df):
    violent = df[df["nibrs_crime_against_category"] == "PERSON"]
    by_beat = violent.groupby("beat")["offense_id"].count().sort_values(ascending=False)
    return by_beat.head(1)  # returns the top beat and count
violent_p1 = beat_with_most_violent(incidents1)
violent_p2 = beat_with_most_violent(incidents2)
violent_p1, violent_p2
(beat K3 11 Name: offense_id, dtype: int64, beat D1 9 Name: offense_id, dtype: int64)
Property crimes on the second-last day of each period
def property_on_second_last_day(df):
    # make a date column
    df = df.copy()
    df["report_date"] = pd.to_datetime(df["report_date_time"]).dt.date
    # keep only property crimes
    prop = df[df["nibrs_crime_against_category"] == "PROPERTY"]
    # sorted unique dates within this subset
    unique_dates = sorted(prop["report_date"].unique())
    if len(unique_dates) < 2:
        return None  # not enough days with property crimes
    second_last_day = unique_dates[-2]
    # count property crimes on that day
    count = (prop["report_date"] == second_last_day).sum()
    return second_last_day, count
prop_p1 = property_on_second_last_day(incidents1)
prop_p2 = property_on_second_last_day(incidents2)
prop_p1, prop_p2
((datetime.date(2025, 11, 9), 136), (datetime.date(2025, 10, 3), 98))
Now I will analyze one of the datasets from the first part: uncovering which incident type was most common over the queried time period, and in which neighborhood that incident type was most prevalent.
Most common incident type during the 5-day period¶
common_type = incidents['offense_category'].value_counts().idxmax()
count_common = incidents['offense_category'].value_counts().max()
common_type, count_common
('ALL OTHER', 456)
Neighborhood where that incident was most prevalent¶
subset = incidents[incidents['offense_category'] == common_type]
neigh_counts = subset['neighborhood'].value_counts()
top_neigh = neigh_counts.idxmax()
top_neigh_count = neigh_counts.max()
top_neigh, top_neigh_count
('CAPITOL HILL', 55)
Next, I will create a 750-meter buffer around the University of Washington, using the University of Washington neighborhood polygon from the neighborhood dataset, and then analyze the crime incidents within the buffer zone:
- Create a bar plot to show the number of incidents per day in the time period I queried.
- Create a folium map which displays the buffer boundary and all crime incidents within the buffer, with different colored markers for each crime category.
First, load the neighborhood polygons, extract the UW polygon, and buffer it by 750 meters.
# Load Seattle neighborhood polygons
neigh = gpd.read_file("Neighborhood.geojson")
# Filter all UW-related polygons using S_HOOD
uw_poly = neigh[neigh["S_HOOD"].str.contains("University", case=False, na=False)]
# Reproject to Web Mercator (meters)
uw_poly_3857 = uw_poly.to_crs(epsg=3857)
# Merge polygons + create 750 m buffer (union_all() replaces the deprecated unary_union)
uw_buffer_geom = uw_poly_3857.union_all().buffer(750)
# Convert buffer back to EPSG:4326 for Folium
uw_buffer_gdf = gpd.GeoDataFrame(geometry=[uw_buffer_geom], crs="EPSG:3857").to_crs(epsg=4326)
uw_buffer_gdf
| | geometry |
|---|---|
| 0 | POLYGON ((-122.3086 47.64313, -122.3087 47.643... |
Clean incident coordinates
# Clean REDACTED or invalid coordinates
incidents_clean = incidents.copy()
incidents_clean["latitude"] = pd.to_numeric(incidents_clean["latitude"], errors="coerce")
incidents_clean["longitude"] = pd.to_numeric(incidents_clean["longitude"], errors="coerce")
incidents_clean = incidents_clean.dropna(subset=["latitude", "longitude"])
# Convert to GeoDataFrame
inc_gdf = gpd.GeoDataFrame(
incidents_clean,
geometry=gpd.points_from_xy(incidents_clean["longitude"], incidents_clean["latitude"]),
crs="EPSG:4326"
).to_crs(epsg=3857)
# Pick incidents INSIDE the UW buffer
inc_in_buffer = inc_gdf[inc_gdf.within(uw_buffer_geom)].copy()
print("Incidents inside 750m UW buffer:", len(inc_in_buffer))
Incidents inside 750m UW buffer: 60
Bar Plot of Incidents Per Day
import matplotlib.pyplot as plt
# Extract dates
inc_in_buffer["date_only"] = pd.to_datetime(inc_in_buffer["report_date_time"]).dt.date
# Bar plot
plt.figure(figsize=(10,5))
inc_in_buffer.groupby("date_only")["offense_id"].count().plot(kind="bar")
plt.title("Incidents per Day within 750m of UW")
plt.xlabel("Date")
plt.ylabel("Incident Count")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
Folium Map (Buffer + Colored Crime Markers)
# Base map centered on UW
m = folium.Map(location=[47.655, -122.303], zoom_start=14)
# Add buffer boundary
folium.GeoJson(
uw_buffer_gdf.to_json(),
style_function=lambda x: {"color": "blue", "weight": 2, "fillOpacity": 0}
).add_to(m)
# Reproject incidents for Folium (EPSG:4326)
inc_in_buffer_4326 = inc_in_buffer.to_crs(epsg=4326)
# Marker colors by category
color_map = {
"PROPERTY CRIME": "red",
"PERSON": "purple",
"ALL OTHER": "green"
}
# Add crime markers
for _, row in inc_in_buffer_4326.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=4,
        color=color_map.get(row["offense_category"], "gray"),
        fill=True,
        fill_opacity=0.7,
        popup=f"{row['offense_category']}<br>{row['report_number']}"
    ).add_to(m)
m
Analysis¶
Over the 5-day period, the bar plot of incidents within 750 meters of the University of Washington shows noticeable variation in daily counts, with one or two dates standing out as local peaks. This suggests that crime near campus is somewhat clustered in time, likely reflecting patterns in student presence, nightlife, and the timing of classes and activities. Days with higher incident counts may correspond to evenings with heavier foot traffic or busier periods around commercial areas.
The folium map further contextualizes these temporal patterns spatially. Within the buffer, incidents are not evenly distributed; they cluster along the campus edges and nearby streets, especially around major arterials and mixed-use blocks. Property crimes appear most frequently and are widely scattered across the buffer, which is consistent with thefts and car prowls in a dense university neighborhood. Person-related incidents are fewer but tend to appear along main routes and near transit stops, where more interpersonal interactions occur.
Overall, the analysis indicates that crime around UW is shaped by both daily activity rhythms and the built environment. These patterns can help inform targeted patrols, lighting improvements, and safety outreach in the areas where students and residents are most active.
Autoplay Heatmap with Timestamp¶
Picking a Date and Prepping the Data
df = incidents.copy()
# Clean coordinates (handles 'REDACTED' etc.)
df["latitude"] = pd.to_numeric(df["latitude"], errors="coerce")
df["longitude"] = pd.to_numeric(df["longitude"], errors="coerce")
df = df.dropna(subset=["latitude", "longitude"])
# Make pure date + hour columns
df["report_date_time"] = pd.to_datetime(df["report_date_time"])
df["date_only"] = df["report_date_time"].dt.date
df["hour"] = df["report_date_time"].dt.hour
# The date with the most incidents:
date_counts = df["date_only"].value_counts()
chosen_date = date_counts.idxmax()
print("Using date:", chosen_date)
df_day = df[df["date_only"] == chosen_date].copy()
print("Incidents on chosen date:", len(df_day))
df_day.head()
Using date: 2025-11-06 Incidents on chosen date: 192
| | report_number | report_date_time | offense_id | offense_date | nibrs_group_a_b | nibrs_crime_against_category | offense_sub_category | shooting_type_group | block_address | latitude | ... | beat | precinct | sector | neighborhood | reporting_area | offense_category | nibrs_offense_code_description | nibrs_offense_code | date_only | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-325462 | 2025-11-06 00:11:39 | 67248678577 | 2025-11-05T22:47:00.000 | A | PERSON | ASSAULT OFFENSES | - | 11XX BLOCK OF SUMMIT AVE | 47.611338 | ... | E3 | East | E | FIRST HILL | 7623 | ALL OTHER | Intimidation | 13C | 2025-11-06 | 0 |
| 1 | 2025-325470 | 2025-11-06 00:15:06 | 67217550137 | 2025-11-05T23:14:00.000 | B | SOCIETY | DISORDERLY CONDUCT & VAGRANCY VIOLATIONS | - | 15XX BLOCK OF 3RD AVE W | 47.632927 | ... | Q2 | West | Q | QUEEN ANNE | 8623 | ALL OTHER | Curfew/Loitering/Vagrancy Violations | 90B | 2025-11-06 | 0 |
| 2 | 2025-325470 | 2025-11-06 00:15:06 | 67217554931 | 2025-11-05T23:14:00.000 | B | ANY | ALL OTHER | - | 15XX BLOCK OF 3RD AVE W | 47.632927 | ... | Q2 | West | Q | QUEEN ANNE | 8623 | ALL OTHER | All Other Offenses | 90Z | 2025-11-06 | 0 |
| 3 | 2025-325470 | 2025-11-06 00:15:06 | 67217557249 | 2025-11-05T23:14:00.000 | A | SOCIETY | NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) | - | 15XX BLOCK OF 3RD AVE W | 47.632927 | ... | Q2 | West | Q | QUEEN ANNE | 8623 | ALL OTHER | Drug/Narcotic Violations | 35A | 2025-11-06 | 0 |
| 4 | 2025-325482 | 2025-11-06 00:55:17 | 67217736640 | 2025-11-05T23:58:00.000 | A | SOCIETY | NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) | - | 6TH AVE S / S DEARBORN ST | 47.595837 | ... | K3 | West | K | CHINATOWN/INTERNATIONAL DISTRICT | 1499 | ALL OTHER | Drug/Narcotic Violations | 35A | 2025-11-06 | 0 |
5 rows × 21 columns
Build Hourly Time Slices for HeatMapWithTime
from collections import defaultdict
# Group coordinates by hour for the chosen date
hourly_points = defaultdict(list)
for _, row in df_day.iterrows():
    h = row["hour"]
    hourly_points[h].append([row["latitude"], row["longitude"]])
# Sort hours and build list in time order
hours_sorted = sorted(hourly_points.keys())
heat_data = [hourly_points[h] for h in hours_sorted]
print("Hours in this day:", hours_sorted)
Hours in this day: [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
Autoplay Heatmap with Timestamp Map¶
from folium.plugins import HeatMapWithTime
# Center map roughly on Seattle
m = folium.Map(location=[47.61, -122.33], zoom_start=12)
# Labels for each time step (show up in the time slider)
time_index = [f"{chosen_date} {h:02d}:00" for h in hours_sorted]
HeatMapWithTime(
    data=heat_data,
    index=time_index,
    auto_play=True,
    max_opacity=0.8,
    radius=12,
    use_local_extrema=False
).add_to(m)
m